Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

Authors

  • Arnau Ramisa
  • Fei Yan
  • Francesc Moreno-Noguer
Abstract

Building upon recent Deep Neural Network architectures, current approaches lying at the intersection of Computer Vision and Natural Language Processing have achieved unprecedented breakthroughs in tasks such as automatic captioning and image retrieval. Most of these learning methods, however, rely on large training sets of images associated with human annotations that specifically describe the visual content. In this paper we propose to go a step further and explore the more complex cases in which the textual descriptions are only loosely related to the images. We focus on the particular domain of news articles, in which the textual content often expresses connotative and ambiguous relations that are only suggested by the images rather than directly inferable from them. We introduce an adaptive CNN architecture that shares most of its structure across multiple tasks, including source detection, article illustration and geolocation of articles. Deep Canonical Correlation Analysis is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset of approximately 100K news articles including images, text and captions, enriched with heterogeneous metadata (such as GPS coordinates and user comments). We show that this dataset is appropriate for exploring all of the aforementioned problems, for which we provide baseline performance using various Deep Learning architectures and different representations of the textual and visual features. We report very promising results and bring to light several limitations of the current state of the art in this domain, which we hope will help spur progress in the field.
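The abstract states that the geolocation loss is based on Great Circle Distance but does not give its exact form. As a rough illustration only, the sketch below measures the great-circle (haversine) distance between predicted and ground-truth (latitude, longitude) pairs on a spherical Earth and averages it over a batch. The spherical radius of 6371 km, the haversine formulation, and the helper names `great_circle_distance` and `gcd_loss` are assumptions for this example, not the authors' definition.

```python
import math

# Assumed mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to [0, 1] so floating-point overshoot cannot break asin.
    a = min(1.0, max(0.0, a))
    return 2 * radius * math.asin(math.sqrt(a))


def gcd_loss(predicted, target):
    """Hypothetical batch loss: mean great-circle distance in km.

    `predicted` and `target` are equal-length sequences of (lat, lon) pairs.
    """
    if not predicted or len(predicted) != len(target):
        raise ValueError("predicted and target must be non-empty and equally long")
    total = sum(
        great_circle_distance(p[0], p[1], t[0], t[1])
        for p, t in zip(predicted, target)
    )
    return total / len(predicted)


if __name__ == "__main__":
    # Two predictions: one exact, one a quarter of the way around the equator.
    preds = [(0.0, 0.0), (0.0, 0.0)]
    truth = [(0.0, 0.0), (0.0, 90.0)]
    print(round(gcd_loss(preds, truth), 2))
```

Unlike a plain Euclidean loss on raw (latitude, longitude) values, this distance respects the wrap-around at ±180° longitude and the convergence of meridians toward the poles. To train a network with it, the same formula would have to be written with the differentiable tensor operations of the chosen deep-learning framework.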

Related sources

Rotation Invariance

This chapter discusses the issue of rotational invariance of a texture analysis system, i.e. one desires that the outcome of the analysis is not affected by the orientation of the input image. We argue that the orthogonal DWT (section 3.4) is very impractical for such an analysis due to its separable nature in 2 dimensions. We therefore employ the non-separable wavelet frames (section 3.3). We d...

A method for objective edge detection evaluation and detector parameter selection. IEEE Transactions on Pattern Analysis & Machine Intelligence (2003)

Yitzhak Yitzhaky and Eli Peli. Additional information and results.

IEEE Transactions on Pattern Analysis and Machine Intelligence

The role of moments in image normalization and invariant pattern recognition is addressed. The classical idea of the principal axes is analyzed and extended to a more general definition. The relationship between moment-based normalization, moment invariants, and circular harmonics is established. Invariance properties of moments, as opposed to their recognition properties, are identified using ...

Photorealistic Monocular Gaze Redirection Using Machine Learning.

We propose a general approach to the gaze redirection problem in images that utilizes machine learning. The idea is to learn to re-synthesize images by training on pairs of images with known disparities between gaze directions. We show that such learning-based re-synthesis can achieve convincing gaze redirection based on monocular input, and that the learned systems generalize well to people an...

Fixed Points of Belief Propagation - An Analysis via Polynomial Homotopy Continuation.

Belief propagation (BP) is an iterative method to perform approximate inference on arbitrary graphical models. Whether BP converges, and whether its solution is a unique fixed point, depends on both the structure and the parametrization of the model. To understand this dependence it is interesting to find all fixed points.

IEEE Transactions on Pattern Analysis and Machine Intelligence

We present a novel approach to reliable and efficient recovery of part-descriptions in terms of superquadric models from range data. We show that superquadrics can directly be recovered from unsegmented data, thus avoiding any pre-segmentation steps (e.g., in terms of surfaces). The approach is based on the recover-and-select paradigm [10]. We present several experiments on real and synthetic ra...

Publication date: 2017